Pesquisa | Portal Regional da BVS

FLAP: a framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database.

Zhang, Huayu; Casey, Arlene; Guellil, Imane; Suárez-Paniagua, Víctor; MacRae, Clare; Marwick, Charis; Wu, Honghan; Guthrie, Bruce; Alex, Beatrice.

Front Digit Health ; 5: 1186208, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-38090654

RESUMO

Introduction: Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set. Methods: In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) based on a machine learning-based matching classifier coupled with a fuzzy aligning algorithm for feature generation with better performance than existing tools. The framework is implemented in Python as an Open Source tool (available at Link). We tested the framework in a real-world scenario of linking individual's (n=771,588) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB). Results: We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n=3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It has also improved usability compared to existing solutions allowing the use of a customised threshold of matching confidence and selection of top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement. Discussion: In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.

Ontology-driven and weakly supervised rare disease identification from clinical notes.

Dong, Hang; Suárez-Paniagua, Víctor; Zhang, Huayu; Wang, Minhong; Casey, Arlene; Davidson, Emma; Chen, Jiaoyan; Alex, Beatrice; Whiteley, William; Wu, Honghan.

BMC Med Inform Decis Mak ; 23(1): 86, 2023 05 05.

Artigo em Inglês | MEDLINE | ID: mdl-37147628

RESUMO

BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to be identified due to few cases available for machine learning and the need for data annotation from domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations. RESULTS: The improvements in the precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes. The proposed weak supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.

Assuntos

Processamento de Linguagem Natural , Doenças Raras , Humanos , Doenças Raras/diagnóstico , Aprendizado de Máquina , Unified Medical Language System , Classificação Internacional de Doenças

Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.

Dong, Hang; Suarez-Paniagua, Victor; Zhang, Huayu; Wang, Minhong; Whitfield, Emma; Wu, Honghan.

Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2294-2298, 2021 11.

Artigo em Inglês | MEDLINE | ID: mdl-34891745

RESUMO

The identification of rare diseases from clinical notes with Natural Language Processing (NLP) is challenging due to the few cases available for machine learning and the need of data annotation from clinical experts. We propose a method using ontologies and weak supervision. The approach includes two steps: (i) Text-to-UMLS, linking text mentions to concepts in Unified Medical Language System (UMLS), with a named entity linking tool (e.g. SemEHR) and weak supervision based on customised rules and Bidirectional Encoder Representations from Transformers (BERT) based contextual representations, and (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III US intensive care discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline processing discharge summaries can surface rare disease cases, which are mostly uncaptured in manual ICD codes of the hospital admissions.

Assuntos

Prontuários Médicos , Doenças Raras , Unified Medical Language System , Humanos , Aprendizado de Máquina , Processamento de Linguagem Natural , Doenças Raras/diagnóstico

The reporting quality of natural language processing studies: systematic review of studies of radiology reports.

Davidson, Emma M; Poon, Michael T C; Casey, Arlene; Grivas, Andreas; Duma, Daniel; Dong, Hang; Suárez-Paniagua, Víctor; Grover, Claire; Tobin, Richard; Whalley, Heather; Wu, Honghan; Alex, Beatrice; Whiteley, William.

BMC Med Imaging ; 21(1): 142, 2021 10 02.

Artigo em Inglês | MEDLINE | ID: mdl-34600486

RESUMO

BACKGROUND: Automated language analysis of radiology reports using natural language processing (NLP) can provide valuable information on patients' health and disease. With its rapid development, NLP studies should have transparent methodology to allow comparison of approaches and reproducibility. This systematic review aims to summarise the characteristics and reporting quality of studies applying NLP to radiology reports. METHODS: We searched Google Scholar for studies published in English that applied NLP to radiology reports of any imaging modality between January 2015 and October 2019. At least two reviewers independently performed screening and completed data extraction. We specified 15 criteria relating to data source, datasets, ground truth, outcomes, and reproducibility for quality assessment. The primary NLP performance measures were precision, recall and F1 score. RESULTS: Of the 4,836 records retrieved, we included 164 studies that used NLP on radiology reports. The commonest clinical applications of NLP were disease information or classification (28%) and diagnostic surveillance (27.4%). Most studies used English radiology reports (86%). Reports from mixed imaging modalities were used in 28% of the studies. Oncology (24%) was the most frequent disease area. Most studies had dataset size > 200 (85.4%) but the proportion of studies that described their annotated, training, validation, and test set were 67.1%, 63.4%, 45.7%, and 67.7% respectively. About half of the studies reported precision (48.8%) and recall (53.7%). Few studies reported external validation performed (10.8%), data availability (8.5%) and code availability (9.1%). There was no pattern of performance associated with the overall reporting quality. CONCLUSIONS: There is a range of potential clinical applications for NLP of radiology reports in health services and research. However, we found suboptimal reporting quality that precludes comparison, reproducibility, and replication. Our results support the need for development of reporting standards specific to clinical NLP studies.

Assuntos

Processamento de Linguagem Natural , Radiografia , Radiologia/normas , Conjuntos de Dados como Assunto , Humanos , Reprodutibilidade dos Testes , Relatório de Pesquisa/normas

A systematic review of natural language processing applied to radiology reports.

Casey, Arlene; Davidson, Emma; Poon, Michael; Dong, Hang; Duma, Daniel; Grivas, Andreas; Grover, Claire; Suárez-Paniagua, Víctor; Tobin, Richard; Whiteley, William; Wu, Honghan; Alex, Beatrice.

BMC Med Inform Decis Mak ; 21(1): 179, 2021 06 03.

Artigo em Inglês | MEDLINE | ID: mdl-34082729

RESUMO

BACKGROUND: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports. METHODS: We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS: We present a comprehensive analysis of the 164 publications retrieved with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. CONCLUSIONS: Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.

Assuntos

Sistemas de Informação em Radiologia , Radiologia , Humanos , Aprendizado de Máquina , Processamento de Linguagem Natural , Reprodutibilidade dos Testes

Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation.

Dong, Hang; Suárez-Paniagua, Víctor; Whiteley, William; Wu, Honghan.

J Biomed Inform ; 116: 103728, 2021 04.

Artigo em Inglês | MEDLINE | ID: mdl-33711543

RESUMO

BACKGROUND: Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding achieved promising performances. However, the explainability of these models is usually poor, preventing them to be used confidently in supporting clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes which can potentially be exploited to improve the performance. METHODS: To address the issues of model explainability and label correlations, we propose a Hierarchical Label-wise Attention Network (HLAN), which aimed to interpret the model by quantifying importance (as attention weights) of words and sentences related to each of the labels. Secondly, we propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation and then injects the representation into the final layers and the label-wise attention layers in the models. We evaluated the methods using three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS (National Health Service) COVID-19 (Coronavirus disease 2019) shielding codes. Experiments were conducted to compare the HLAN model and label embedding initialisation to the state-of-the-art neural network based methods, including variants of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). RESULTS: HLAN achieved the best Micro-level AUC and F1 on the top-50 code prediction, 91.9% and 64.1%, respectively; and comparable results on the NHS COVID-19 shielding code prediction to other models: around 97% Micro-level AUC. More importantly, in the analysis of model explanations, by highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to the CNN-based models and its downgraded baselines, HAN and HA-GRU. Label embedding (LE) initialisation significantly boosted the previous state-of-the-art model, CNN with attention mechanisms, on the full code prediction to 52.5% Micro-level F1. The analysis of the layers initialised with label embeddings further explains the effect of this initialisation approach. The source code of the implementation and the results are openly available at https://github.com/acadTags/Explainable-Automated-Medical-Coding. CONCLUSION: We draw the conclusion from the evaluation results and analyses. First, with hierarchical label-wise attention mechanisms, HLAN can provide better or comparable results for automated coding to the state-of-the-art, CNN-based models. Second, HLAN can provide more comprehensive explanations for each label by highlighting key words and sentences in the discharge summaries, compared to the n-grams in the CNN-based models and the downgraded baselines, HAN and HA-GRU. Third, the performance of deep learning based multi-label classification for automated coding can be consistently boosted by initialising label embeddings that captures the correlations among labels. We further discuss the advantages and drawbacks of the overall method regarding its potential to be deployed to a hospital and suggest areas for future studies.

Assuntos

COVID-19 , Codificação Clínica/métodos , Redes Neurais de Computação , SARS-CoV-2 , COVID-19/epidemiologia , Codificação Clínica/estatística & dados numéricos , Aprendizado Profundo , Registros Eletrônicos de Saúde/estatística & dados numéricos , Humanos , Informática Médica , Pandemias/estatística & dados numéricos , Reino Unido/epidemiologia

A two-stage deep learning approach for extracting entities and relationships from medical texts.

Suárez-Paniagua, Víctor; Rivera Zavala, Renzo M; Segura-Bedmar, Isabel; Martínez, Paloma.

J Biomed Inform ; 99: 103285, 2019 11.

Artigo em Inglês | MEDLINE | ID: mdl-31546016

RESUMO

This work presents a two-stage deep learning system for Named Entity Recognition (NER) and Relation Extraction (RE) from medical texts. These tasks are a crucial step to many natural language understanding applications in the biomedical domain. Automatic medical coding of electronic medical records, automated summarizing of patient records, automatic cohort identification for clinical studies, text simplification of health documents for patients, early detection of adverse drug reactions or automatic identification of risk factors are only a few examples of the many possible opportunities that the text analysis can offer in the clinical domain. In this work, our efforts are primarily directed towards the improvement of the pharmacovigilance process by the automatic detection of drug-drug interactions (DDI) from texts. Moreover, we deal with the semantic analysis of texts containing health information for patients. Our two-stage approach is based on Deep Learning architectures. Concretely, NER is performed combining a bidirectional Long Short-Term Memory (Bi-LSTM) and a Conditional Random Field (CRF), while RE applies a Convolutional Neural Network (CNN). Since our approach uses very few language resources, only the pre-trained word embeddings, and does not exploit any domain resources (such as dictionaries or ontologies), this can be easily expandable to support other languages and clinical applications that require the exploitation of semantic information (concepts and relationships) from texts. During the last years, the task of DDI extraction has received great attention by the BioNLP community. However, the problem has been traditionally evaluated as two separate subtasks: drug name recognition and extraction of DDIs. To the best of our knowledge, this is the first work that provides an evaluation of the whole pipeline. Moreover, our system obtains state-of-the-art results on the eHealth-KD challenge, which was part of the Workshop on Semantic Analysis at SEPLN (TASS-2018).

Assuntos

Mineração de Dados/métodos , Aprendizado Profundo , Registros Eletrônicos de Saúde/classificação , Codificação Clínica , Interações Medicamentosas , Humanos

Evaluation of pooling operations in convolutional architectures for drug-drug interaction extraction.

Suárez-Paniagua, Víctor; Segura-Bedmar, Isabel.

BMC Bioinformatics ; 19(Suppl 8): 209, 2018 06 13.

Artigo em Inglês | MEDLINE | ID: mdl-29897318

RESUMO

BACKGROUND: Deep Neural Networks (DNN), in particular, Convolutional Neural Networks (CNN), has recently achieved state-of-art results for the task of Drug-Drug Interaction (DDI) extraction. Most CNN architectures incorporate a pooling layer to reduce the dimensionality of the convolution layer output, preserving relevant features and removing irrelevant details. All the previous CNN based systems for DDI extraction used max-pooling layers. RESULTS: In this paper, we evaluate the performance of various pooling methods (in particular max-pooling, average-pooling and attentive pooling), as well as their combination, for the task of DDI extraction. Our experiments show that max-pooling exhibits a higher performance in F1-score (64.56%) than attentive pooling (59.92%) and than average-pooling (58.35%). CONCLUSIONS: Max-pooling outperforms the others alternatives because is the only one which is invariant to the special pad tokens that are appending to the shorter sentences known as padding. Actually, the combination of max-pooling and attentive pooling does not improve the performance as compared with the single max-pooling technique.

Assuntos

Algoritmos , Interações Medicamentosas , Armazenamento e Recuperação da Informação , Bases de Dados como Assunto , Aprendizado Profundo , Humanos , Redes Neurais de Computação

Exploring convolutional neural networks for drug-drug interaction extraction.

Suárez-Paniagua, Víctor; Segura-Bedmar, Isabel; Martínez, Paloma.

Database (Oxford) ; 20172017 01 01.

Artigo em Inglês | MEDLINE | ID: mdl-28605776

RESUMO

Drug-drug interaction (DDI), which is a specific type of adverse drug reaction, occurs when a drug influences the level or activity of another drug. Natural language processing techniques can provide health-care professionals with a novel way of reducing the time spent reviewing the literature for potential DDIs. The current state-of-the-art for the extraction of DDIs is based on feature-engineering algorithms (such as support vector machines), which usually require considerable time and effort. One possible alternative to these approaches includes deep learning. This technique aims to automatically learn the best feature representation from the input data for a given task. The purpose of this paper is to examine whether a convolutional neural network (CNN), which only uses word embeddings as input features, can be applied successfully to classify DDIs from biomedical texts. Proposed herein, is a CNN architecture with only one hidden layer, thus making the model more computationally efficient, and we perform detailed experiments in order to determine the best settings of the model. The goal is to determine the best parameter of this basic CNN that should be considered for future research. The experimental results show that the proposed approach is promising because it attained the second position in the 2013 rankings of the DDI extraction challenge. However, it obtained worse results than previous works using neural networks with more complex architectures.

Assuntos

Mineração de Dados/métodos , Interações Medicamentosas , Processamento Eletrônico de Dados/métodos , Modelos Teóricos , Processamento de Linguagem Natural , Redes Neurais de Computação , Máquina de Vetores de Suporte

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA